NSF PAR Search | NSF Public Access Repository

Beyond Additive Fusion: Learning Non-Additive Multimodal Interactions

Wörtwein, Torsten; Sheeber, Lisa; Allen, Nicholas; Cohn, Jeffrey; Morency, Louis-Philippe (January 2022, Findings of the Association for Computational Linguistics: EMNLP 2022)

Multimodal fusion addresses the problem of analyzing spoken words in the multimodal context, including visual expressions and prosodic cues. Even when multimodal models lead to performance improvements, it is often unclear whether bimodal and trimodal interactions are learned or whether modalities are processed independently of each other. We propose Multimodal Residual Optimization (MRO) to separate unimodal, bimodal, and trimodal interactions in a multimodal model. This improves interpretability as the multimodal interaction can be quantified. Inspired by Occam’s razor, the main intuition of MRO is that (simpler) unimodal contributions should be learned before learning (more complex) bimodal and trimodal interactions. For example, bimodal predictions should learn to correct the mistakes (residuals) of unimodal predictions, thereby letting the bimodal predictions focus on the remaining bimodal interactions. Empirically, we observe that MRO successfully separates unimodal, bimodal, and trimodal interactions while not degrading predictive performance. We complement our empirical results with a human perception study and observe that MRO learns multimodal interactions that align with human judgments.
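The residual idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps their multimodal model for two sequential least-squares fits on synthetic data, and every name in it (`fit_least_squares`, `predict`, the toy modalities `a` and `b`) is hypothetical. A unimodal (additive) stage is fitted first, and a bimodal stage is then fitted only to what the unimodal stage could not explain.

```python
from itertools import product


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (Gauss-Jordan elimination)."""
    n = len(rows[0])
    # Augmented system [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(n):
            if k != col:
                factor = aug[k][col]
                aug[k] = [vk - factor * vc for vk, vc in zip(aug[k], aug[col])]
    return [aug[i][n] for i in range(n)]


def predict(weights, rows):
    return [sum(w * x for w, x in zip(weights, r)) for r in rows]


# Synthetic data (an assumption for illustration): two scalar "modalities"
# a and b, and a target with additive parts plus one interaction term.
grid = [-1.0, 0.0, 1.0]
samples = list(product(grid, grid))
targets = [2.0 * a - 1.0 * b + 0.5 * a * b for a, b in samples]

# Stage 1: unimodal contributions only (intercept + a + b).
uni_rows = [[1.0, a, b] for a, b in samples]
uni_weights = fit_least_squares(uni_rows, targets)
uni_pred = predict(uni_weights, uni_rows)

# Stage 2: the bimodal term is fitted to the residuals of stage 1.
residuals = [t - p for t, p in zip(targets, uni_pred)]
bi_rows = [[a * b] for a, b in samples]
bi_weights = fit_least_squares(bi_rows, residuals)
bi_pred = predict(bi_weights, bi_rows)

final_pred = [u + b for u, b in zip(uni_pred, bi_pred)]

# Share of the prediction energy carried by the bimodal stage.
uni_energy = sum(u * u for u in uni_pred)
bi_energy = sum(b * b for b in bi_pred)
interaction_share = bi_energy / (uni_energy + bi_energy)
print(uni_weights, bi_weights, interaction_share)
```

Because the bimodal stage only sees residuals, its output can be read as the part of the prediction that needs both modalities at once, which is what makes a quantity like `interaction_share` meaningful in this toy setting.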
Full Text Available
Reconsidering the Duchenne Smile: Formalizing and Testing Hypotheses About Eye Constriction and Positive Emotion

https://doi.org/10.1007/s42761-020-00030-w

Girard, Jeffrey M.; Cohn, Jeffrey F.; Yin, Lijun; Morency, Louis-Philippe (March 2021, Affective Science)
Full Text Available
A Person- and Time-Varying Vector Autoregressive Model to Capture Interactive Infant-Mother Head Movement Dynamics

https://doi.org/10.1080/00273171.2020.1762065

Chen, Meng; Chow, Sy-Miin; Hammal, Zakia; Messinger, Daniel S.; Cohn, Jeffrey F. (January 2021, Multivariate Behavioral Research)

Full Text Available
Human-Guided Modality Informativeness for Affective States

https://doi.org/10.1145/3462244.3481004

Wörtwein, Torsten; Sheeber, Lisa B.; Allen, Nicholas; Cohn, Jeffrey F.; Morency, Louis-Philippe (January 2021, ACM International Conference on Multimodal Interaction)

This paper studies the hypothesis that not all modalities are always needed to predict affective states. We explore this hypothesis in the context of recognizing three affective states that have shown a relation to a future onset of depression: positive, aggressive, and dysphoric. In particular, we investigate three important modalities for face-to-face conversations: the vision, language, and acoustic modalities. We first perform a human study to better understand which subset of modalities people find informative when recognizing three affective states. As a second contribution, we explore how these human annotations can guide automatic affect recognition systems to be more interpretable while not degrading their predictive performance. Our studies show that humans can reliably annotate modality informativeness. Further, we observe that guided models significantly improve interpretability, i.e., they attend to modalities similarly to how humans rate the modality informativeness, while at the same time showing a slight increase in predictive performance.
Full Text Available
Bag-of-Acoustic-Words for Mental Health Assessment: A Deep Autoencoding Approach

https://doi.org/10.21437/Interspeech.2019-3059

Du, Wenchao; Morency, Louis-Philippe; Cohn, Jeffrey; Black, Alan W. (September 2019, Proceedings of the Annual Conference of the International Speech Communication Association)

Full Text Available
FACS3D-Net: 3D Convolution based Spatiotemporal Representation for Action Unit Detection

https://doi.org/10.1109/ACII.2019.8925514

Yang, Le; Ertugrul, Itir Onal; Cohn, Jeffrey F.; Hammal, Zakia; Jiang, Dongmei; Sahli, Hichem (September 2019, 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII))

Most approaches to automatic facial action unit (AU) detection consider only spatial information and ignore AU dynamics. For humans, dynamics improve AU perception. Is the same true for algorithms? To make use of AU dynamics, recent work in automated AU detection has proposed a sequential spatiotemporal approach: model spatial information using a 2D CNN and then model temporal information using a long short-term memory (LSTM) network. Inspired by the experience of human FACS coders, we hypothesized that combining spatial and temporal information simultaneously would yield more powerful AU detection. To achieve this, we propose FACS3D-Net, which simultaneously integrates 3D and 2D CNNs. Evaluation was on the Expanded BP4D+ database of 200 participants. FACS3D-Net outperformed both 2D CNN and 2D CNN-LSTM approaches. Visualizations of learnt representations suggest that FACS3D-Net is consistent with the spatiotemporal dynamics attended to by human FACS coders. To the best of our knowledge, this is the first work to apply 3D CNN to the problem of AU detection.
Full Text Available
Reconsidering the Duchenne Smile: Indicator of Positive Emotion or Artifact of Smile Intensity?

https://doi.org/10.1109/ACII.2019.8925535

Girard, Jeffrey M.; Shandar, Gayatri; Liu, Zhun; Cohn, Jeffrey F.; Yin, Lijun; Morency, Louis-Philippe (September 2019, 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII))

The Duchenne smile hypothesis is that smiles that include eye constriction (AU6) are the product of genuine positive emotion, whereas smiles that do not are either falsified or related to negative emotion. This hypothesis has become very influential and is often used in scientific and applied settings to justify the inference that a smile is either true or false. However, empirical support for this hypothesis has been equivocal and some researchers have proposed that, rather than being a reliable indicator of positive emotion, AU6 may just be an artifact produced by intense smiles. Initial support for this proposal has been found when comparing smiles related to genuine and feigned positive emotion; however, it has not yet been examined when comparing smiles related to genuine positive and negative emotion. The current study addressed this gap in the literature by examining spontaneous smiles from 136 participants during the elicitation of amusement, embarrassment, fear, and pain (from the BP4D+ dataset). Bayesian multilevel regression models were used to quantify the associations between AU6 and self-reported amusement while controlling for smile intensity. Models were estimated to infer amusement from AU6 and to explain the intensity of AU6 using amusement. In both cases, controlling for smile intensity substantially reduced the hypothesized association, whereas the effect of smile intensity itself was quite large and reliable. These results provide further evidence that the Duchenne smile is likely an artifact of smile intensity rather than a reliable and unique indicator of genuine positive emotion.
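What "controlling for smile intensity" means can be shown with a small sketch. It is an illustration only: it uses ordinary least squares with residualization instead of the Bayesian multilevel models named in the abstract, and the data are synthetic, deliberately constructed so that AU6 relates to amusement only through smile intensity. Nothing here comes from the BP4D+ dataset, and the function names are hypothetical.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, x):
    """Remove from y the part linearly explained by x."""
    b, mx, my = slope(x, y), mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


# Synthetic design (an assumption): intensity levels 1..5 crossed with two
# independent +/-1 disturbances, one for AU6 and one for amusement.
design = list(product(range(1, 6), (-1.0, 1.0), (-1.0, 1.0)))
intensity = [float(level) for level, _, _ in design]
au6 = [float(level) + e_au6 for level, e_au6, _ in design]
amusement = [0.8 * level + e_amuse for level, _, e_amuse in design]

# Unadjusted association: amusement regressed on AU6 alone.
raw_slope = slope(au6, amusement)

# Adjusted association: both variables are first residualized on intensity,
# which yields the same AU6 coefficient as a regression that includes
# intensity as a covariate (the Frisch-Waugh-Lovell result).
adjusted_slope = slope(
    residualize(au6, intensity), residualize(amusement, intensity)
)

print(f"raw slope: {raw_slope:.3f}, adjusted slope: {adjusted_slope:.3f}")
```

In this constructed example the unadjusted slope is clearly positive while the adjusted slope is essentially zero, which is the pattern an intensity artifact would produce. Whether real data show that pattern is the empirical question the study addresses.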
Full Text Available
